THE GEOMETRIC MEAN OF TWO MATRICES 
FROM A COMPUTATIONAL VIEWPOINT 

BRUNO IANNAZZO* 

Abstract. The geometric mean of two matrices is considered and analyzed from a computational 
viewpoint. Some useful theoretical properties are derived and an analysis of the conditioning is 
performed. Several numerical algorithms based on different properties and representation of the 
geometric mean are discussed and analyzed and it is shown that most of them can be classified in 
terms of the rational approximations of the inverse square root functions. A review of the relevant 
applications is given. 
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1. Introduction. The geometric mean of two positive numbers a and b is defined 
as \/ab. The adjective "geometric" is referred to the fact that the geometric mean is 
the length of the edge of a square having the same area as a rectangle whose edges 
have length a and b, respectively. 

A typical wish in mathematics is to generalize concepts as much as possible. It 
is then understood why researchers have tried to generalize the concept of geometric 
mean to the matrix generalizations of positive numbers, namely Hermitian positive 
definite matrices. We denote by V n the set ofnxn Hermitian positive definite ma- 
trices, which we will call just positive matrices. The geometric mean of two matrices 
need to be a function <p :V n X V n — > V n . 

The generalization is not trivial, since the formula Vab, applied to matrices would 
lead to the definition ip(A,B) := {AB) 1 / 2 , which is unsatisfatory since, for instance, 
i/j(A,B) ip(B,A). A different, more fruitful, approach to get a fair generalization 
is axiomatic, that is derive the definition of geometric mean from the properties it 
ought to satisfy. 

A natural property required by a generalization is the following: given a diagonal 
matrix D = diag(<ii, . .. ,d n ), with di > 0, and the identity matrix /, the geometric 
mean is <p(D,I) := diag(v / 3i, ■ ■ ■ , ydn). The aforementioned property is referred as 
consistency with scalars. 

The consistency with scalars is not sufficient to uniquely define a geometric mean. 
We need another property, namely the congruence invariance: let A, B s V n and S 
belonging to the set GL(n) of invertible matrices of size n, then <p(S* AS, S* BS) — 
S*(p(A, B)S. The congruence invariance is mathematically relevant since it states 
that the geometric mean interplay well with the action of GL(n) over V n , that is the 
congruence. Moreover, it allows the geometric mean to model physical quantities. 

The following is a minor variation of a result of Bhatia [TOl Sec. 4.1]. 

Theorem 1.1. Let ip : V n x V n — > V n be a function which verifies both consistency 
with scalars and congruence invariance, then 

<p(A, B) = A 1 / 2 (A~ 1 I 2 BA~ 1/2 ) 1 / 2 A 1 / 2 =: A#B. (1.1) 

The symbol A 1 / 2 stands for the principal square root of the matrix A, which is 
a matrix satisfying the equation X 2 = A and whose eigenvalues have positive real 
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part. Such a matrix exists and is unique if A has no nonpositive real eigenvalues, in 
particular if A is positive then A 1 / 2 is positive. Moreover, for any invertible matrix 
M, it holds that M~ x A l / 2 M = (M~ 1 AM) 1 / 2 (see [2D]). 

It can be proved that A^B verifies all the other properties required by a geometric 
mean, for instance A#B = B#A, and if A and B commute, then A#B = (AB) 1 / 2 . 
Thus, the definition is well established. 

Notice that A#B solves the Riccati equation XA~ 1 X = B and it can be proved 
that it is the unique positive solution (TUJ Thm. 4.1.3]. Moreover, using the properties 
of the principal square root one can derive 

A#B = AiA^Bfl 2 = (BA~ 1 ) 1/2 A = B(B^ 1 A) 1 ^ 2 = (AB~ 1 ) 1 / 2 B. (1.2) 

Yet another important property of the geometric mean can be given in terms of a 
special Riemannian geometry of V n . The geometry is obtained by the scalar product 
(X,Y)a = tr&ce(A~ 1 XA~ 1 Y) on the tangent space T&P n at a positive matrix A 
(which is the set of Hcrmitian matrices). In the resulting Riemannian manifold there 
exists only one geodesic, 7 : [0, 1] — > V n , joining any two positive definite matrices A 
and B and whose explicit expression is known to be [101 Thm. 6.1.6] 

A# t B := 7 (t) = A(A~ 1 B) t = A 1 ' 2 ^ 1 ' 2 BA- 1 ' 2 ) 1 A 1 ' 2 . (1.3) 

It is now apparent that A#B = A#x/ 2 B is the mid-point of the geodesic joining A 
and B. 

The definition A#B — A(B~ 1 A)~ 1 / 2 in terms of an inverse square root yields a 
rather large number of integral representations [26) among which we note the following 
0: 

A#B= l -f (^ + d-t)A-^ dt (L4) 

The relevant applications of the geometric mean of two matrices are reviewed in 
Section 

The contributions of the paper arc of different kind. First of all, we investigate 
some simple theoretical properties of the geometric mean of two matrices, giving a new 
formula for A#B in terms of the polar decomposition and an expression of A#B in 
terms of polynomials in A~ l B and B~ l A which are useful for computational purposes. 
Then, we discuss the sensitivity (in the Euclidean sense) of the matrix geometric 
mean function with respect to perturbations getting upper and lower bounds for the 
condition number. Then, we devote a large part to the old and new algorithms for 
the geometric mean and related quantities like A# t B. 

The existing methods are the averaging technique of Anderson and Trapp [3] , a 
method based on the matrix sign function of Higham et al. [22], the palindromic cyclic 
reduction of Iannazzo and Meini |25j and a method based on a continued fraction 
expansion of Rai'ssouli and Leazizi 32J. We show that the sign method and the 
palindromic cyclic reduction are two variants of the averaging technique. 

We present some further algorithms for the matrix geometric mean: the first one 
is based on the Cholesky factorization and the Schur decomposition and performs with 
great numerical stability in practice; the second is based on the expression of A#B in 
terms of the polar decomposition of certain matrices and is attractive since it relies 
on the small computational cost of the polar factor in terms of arithmetic operations 



(ops); the third is a Gaussian quadrature applied to the integral representation (1.4); 
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while the fourth is based on the rational minimax approximation to the inverse square 
root which is essentially the algorithm of Higham, Hale and Trefethen |18j . 

A perhaps surprising property is that the polar decomposition algorithm and 
the Gaussian quadrature, in their basic definition, produce the same sequence as the 
averaging technique and so they can be seen as yet two more variants of it. Moreover, 
they can be described in terms of certain Pade approximation at xq — 1 of the inverse 
square root. 

The organization of the paper is as follows. In the next section we give a couple of 
properties of the geometric mean which will be useful later. In Section [3] we compute 
the condition number of the matrix geometric mean. In Section [4] we discuss the 
Cholesky-Schur algorithm. In Section[5]we discuss the algorithms related to the Pade 
approximation of z -1 / 2 while in Section [6] we discuss the ones related to its rational 
minimax approximation. In Section [7] we review the applications where a matrix 
geometric mean is required. In Section [8] we perform some numerical tests, while in 
Section!] we draw the conclusions. 

Now, we recall some concept and facts that will be used in the paper. We recall 
that any nonsingular matrix M can be written as HU where H is Hermitian and 
U is unitary; the latter is called the polar factor of M, denoted by polar(M), and 
whose explicit expression is U = M(M*M)~ 1 ' 2 . Given two matrices M and N we 
denote by M®N their Kronecker (tensor) product and by vec(M) the vector obtained 
stacking the columns of M. We speak of vec basis for C nx ™ as the basis in which the 
coordinates of a matrix M are vec(M), similarly the vec basis for C nx ™ x C" x ™ is 

the one in which the coordinates of (M, N) are yecfiV) ' ^ na ^y' ^ /(-^) ^ e a 
matrix function, then for any invertible matrix M, it holds that 

f(MAM- 1 )=Mf(A)M~ 1 ; (1.5) 

we call this property similarity invariance of matrix functions. Beside similarity 
invariance, we use several other properties of general and specific matrix functions, 
for this topic we address the reader to the book of Higham [20] . 

2. Some properties of the geometric mean. Any positive matrix A can be 
written as A = C*C for an invertible C. Two noticeable examples are A = A 1 / 2 A 1 / 2 
and the Cholesky factorization A = R* R, where R is upper triangular with positive 
diagonal entries. 

Given two positive matrices A and B, with factorizations A = C*C and B = D*D, 
the matrix geometric mean of A and B can be characterized using the following result 
which generalizes Proposition 4.1.8 of [ID] . 

Proposition 2.1. Let A = C*C and B = D*D with C,D e C nxn nonsingular. 
Then 

Aj^B = C* polar(CD _1 )£), (2.1) 

where polar(CZ?~ 1 ) is the unitary polar factor ofCD~ l . Moreover, let U be a unitary 
matrix such that C*UD > 0, then C*UD = A#B and U = polar(C_D" 1 ). 

Proof. Using the formula polar(Af) = M(M _1 M - *) 1 ' 2 and the similarity invari- 



ance of the square root (1.5) we get 



C*polar(CT>- 1 )L> = C*CD- x {DC~ l C-*D*) x/ ' i D 

= A{D- 1 DA- l D*D) 1 ' 2 = A(A~ X B) X / 2 = A#B. 
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The second statement can be obtained suitably modifying the proof of Proposition 

4.1.8 of pro]. □ 

Yet another interesting property is obtained using the fact that the principal 
square root of a matrix Z E C nxn i s a polynomial in Z [2U]. In particular, if Z has 
real positive eigenvalues, then Z 1 / 2 — p(Z) where p(z) is the polynomial interpolating 
the points (Ai, \/Ai), . . . , (X s , VX^), where Ai, . . . , X s are the distinct eigenvalues of Z. 

Since A#B = A{A- 1 B) 1 I 2 = B(B~ 1 A) 1 / 2 , we get the following result. 

Proposition 2.2. Let A,B E V n , and let Ai, . . . , A s be the distinct eigenvalues of 
A~ X B, then Af^B = Ap(A~~ 1 B) = Bq(B~ 1 A), where p(z) and q(z) are the interpolat- 
ing polynomials ofthepoints (Ai, ■ . . , (A s , \/A7) and (l/Xi, l/v%"), . . . , (1/A S , 1/v 
respectively. 

Proposition |2.2| has some interesting consequences. First of all we get that if A 
and B are 2x2 matrices then AffB = a A + a\B. An explicit expression of ao and 
ai is well known, in fact [TOl Prop. 4.1.12] 



A#B=—== \ H (a-'A + r'B), a = ydctp), = y/tet(B). 

Similarly, if A and B are such that A~ 1 B has just two eigenvalues then A#B = 
aoA + a\B. 

An application of Proposition |2.2| concerns the preservation of matrix structures 
by the geometric mean. For instance, if Q is an algebra of matrices, then A,BG QO,V n 
implies that Aj^B E Q flV n . An example of Q n V n is the set of circulant Hermitian 
positive definite matrices. 



Proposition 2.2 holds also for A# t B since (A~ 1 B) t as well is a polynomial in 
A~ X B. This fact allows us to prove that the Karcher mean of positive definite matrices 
(see [11] for the definition) preserves Q n V n , where Q is an algebra of matrices. 

Proposition 2.3. Let Q be an algebra of n x n matrices and let A$, . . . ,A m E 
Q nV n , then the Karcher mean G of Aq, . . . , A m belongs to Q PI V n . 

Proof. The sequence S k = S k -i#i/kA{k mod m)+i, with 5i = Ai, converges to 



the Karcher mean [24]. From Proposition 2.2 applied to A^ t B we get that Sk belongs 



to Q n V n for each k, and since Q is closed we have that G E Q. On the other hand, 
by the definition of Karcher mean G E V n , and thus G G Q fl V n . □ 

3. Conditioning. We describe the sensitivity of the matrix geometric mean 
function <p : V n x V n — > V n '■ (A, B) — > AjfcB to perturbations in both its arguments, 
A and B. For any couple of positive matrices (A, B) there exists a neighborhood 
U C C" x ™ x C" xn of it in which the function ip can be extended to a diffcrcntiablc 
function tp with the same formula <p(X,Y) = X(X^ 1 Y) 1 ^ 2 , for X, Y E U. The 
differential (Frcchet derivative) at a point (A, B) is a linear function dip(A,B) '■ C nxn x 

^nxn y ^nxn 

A measure of the sensitivity is given by the relative condition number whose 
expression, following Rice [331 Thm. 4], is 

A(~ (A n\\ W^B)\\\W { A.B)\\ 

cond(( < 9, [A, B)j = - 



MAB)\\ 

where the norm of the couple (^4, B) is the norm of the matrix [A B] and the norm 
of the operator d^^.s) is defined in the usual sense by 

lu ~ „ \\dlp (AtB) [(H,K)]\\ 

H^^) 11 = H,K no^th ZCr o MB~Kj\\ ' 
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To give an explicit expression of the condition number from which deduce suitable 
bounds, we need to compute the differential of <p at a couple (A, B) . ft may be useful 
the following expression of the differential of the matrix mean function (extended in 
a neighborhood of (A, B)). 

Theorem 3.1. Let A,Be V n and H,K G C nxn , and let ip be the extension 
of the matrix mean function in a neighborhood of (A, B) in (£ nxn x C™ xn , then the 
following representation of D = d(p( A ,B) [(H, K)] G C" xn holds 

vec(D) = (I® Z- 1 + Z- 1 ® I)- 1 vcc(iJ) + (I ® Z + Z ® I)' 1 vec(K), 

where Z = (BA- 1 ) 1 / 2 . 

Proof. It is enough to find the "partial" derivative of <p with respect to a pertur- 
bation on B, say K, then interchanging A and B yields the full result. 

Let f(z) = z 1 / 2 , recall that for any matrix with real positive eigenvalues S G C" xn 
and any matrix direction F G C" XI \ it holds that vec(df s [F]) = (I® S 1 / 2 + {S 1 / 2 ) 1, ® 
I)- 1 vec(F) [20J Chap. 6]. 

Since, by the similarity invariance of the square root (1.5 1, <p(X, Y) = X(X~ 1 Y) 1 ' 2 = 
XiX^YX-^X) 1 ! 2 = (YX- x yi 2 X, for any (X,Y) in a neighborhood of (A, B), we 
have by the chain rule 

d$ {A , B) [Q,K] = df BA -x[KA- l ]A 
which in the vec basis can be written as 

vec{d!p {AB) [0, K]) = (A®I)(I®Z + Z T ® iy 1 vcc(Jf A" 1 ) 

= (A® ® Z + Z T ® I)- 1 (A' 1 ® I)vec(K) 
= (I®Z + AZ T A- 1 ® I)" 1 vec(iT) 

where we have used the fact that AZ T A^ 1 = A(A^ 1 B) 1 / 2 A^ 1 = {BA^Y^AA' 1 = 
Z. □ 



Using Theorem [ST] and setting Ml = (J ® + Z^ 1 ® I) -1 and M 2 = {I®Z- 

n for the (relative) cc 

[Mi M 2 ]|| 2 |p S]|| F 



Z ® I) 1 it is possible to get the expression for the (relative) condition number in the 
Euclidean (Frobenius) norm 



cond(y>, (A, B)) 



\A#B\ 



We have used the fact that the operator norm induced by the Euclidean norm coincides 
with the matrix 2-norm (spectral norm) of the matrix representation of the operator 
in the vec basis since 

\\<Kp {AiB) [H,K\\\ F || [Mi M 2 ]vcc([ff K])\\ 2 

sup iruVin = sup ii (\tj yiMi = Ml M2 2 - 

\\[H K]\\ F \\vec[[H K])\\ 2 

The absolute condition number is k(A,B) := ||[Ml ikf 2 ] 1 1 2 ■ 

From the properties of the spectral norm, the following inequalities hold 

max{||Mi|| 2 ,||M 2 || 2 } < ||[Mj M 2 ]|| 2 < (||Mi||i + \\M 2 \\ 2 f' 2 . (3-1) 

To get bounds for the condition number, observe that there exists K such that D = 
K~ X ZK is diagonal and K ® K diagonalizes both Mi and M 2 . Thus, using (3.1 1, we 
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get the bounds for the condition number in the Euclidean norm, which we denote by 
k f (A,B), 

1 -^{p{Z),p(Z- 1 )} ^k f (A,B)^ X -» 2 {K ®K){p{Zf + p{Z- x Y) 112 . (3.2) 

Let U be a unitary matrix which diagonalizes M = B^ 1 / 2 AB^ 1 / 2 , that is U*MU 
is a diagonal matrix, then the matrix V = B~ X I 2 U diagonalizes B~ x A and thus 
diagonalizes Z and Z~ Y . Moreover, p 2 (V) — p, 2 (B~ x l 2 ) = p 2 (B) 1 / 2 . We get an 
upper bound for p, 2 (K ® K) as p 2 (V ® V) = [12(B) and interchanging A and B we 
get a less sharp but better understandable upper bound for the condition number 

K F {A,B) < i min{// 2 (A),// 2 (B)} v /p(S- 1 J 4)+p(A- 1 B). (3.3) 



Given A and B we can possibly reduce the bounds in (3.2 1 by a simple scaling 
of the matrices A and B by positive parameters a and /3, getting the new matrices 
A = aA, B = j3B and Z = ^JJJaZ. From A#B we obtain the required geometric 
mean through A#B = ^((aA)#(/3S)). 



The choices of a and f3 which minimize both y p(Z) 2 + p(Z 1 ) 2 and max{p(Z), p(Z 1 )} 
are such that a/j3 = p(Z) / p(Z~ 1 ) = mM where m and M are the extreme eigenval- 
ues of Z. An approximate value of a//3 can be obtained by the approximations of M 2 
and m 2 got by some steps of the power and inverse power methods applied to B~ x A 
(or A- X B). 

4. An algorithms based on the Schur decomposition. We explain how 
to efficiently compute a point of the geodesic A# t B using the Schur decomposition 
and the Cholesky factorization. The resulting algorithm can be used to compute the 
matrix geometric mean for t = 1/2. 

Consider the Cholesky factorizations A = R* a Ra and B = R* b Rb- Using the 
similarity invariance of the matrix functions we get 

A# t B = A(A~ 1 B) t = R* a R a (R2 1 Ra*BRa 1r aY = R* A {R A * BR-/) 1 R Al (4.1) 

and thus, the evaluation of A# t B can be obtained by forming the Cholesky decompo- 
sition of A, inverting the Cholesky factor R A (whose condition number is the square 
root of the one of A) and computing the t-th power of the positive definite matrix 
V = R A *BR A . This is done by computing the Schur form V = UDU* and getting 

A# t B = R* A U D l U* R A , R A *BR A X = UDU*, (4.2) 

The power of D is computed elementwise. 

If the condition number of A is greater than the one of B, it may be convenient 
to interchange A and B in order to get a possibly more accurate results. Using the 
simple equality A# t B = B#i_ t A, the formula is 

A# t B =B# l _ t A = R* B UD l - t U*R B , R B * AR B X = UDU* . (4.3) 

We synthesize the procedure. 



Algorithm 4.1 (Cholesky- Schur method) Given A and B positive definite ma- 
trices, t £ (0, 1), compute A# t B. 
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1. if the condition number of A is greater than the condition number of B 
interchange A and B, computing B#i_ t A; 

2. compute the Cholesky factorizations A = R* a Ra, B — R* b Rb and form 
V = R^BR^ 1 = X* X where X is the upper triangular matrix solving 
XRb = Ra', 

3. compute the Schur decomposition UDU* = V; 

4. compute A# t B = R* B UD t U*R B . 

The computational cost of the procedure is given by the Cholesky factorizations 
(|n 3 arithmetic operations (ops)), the computation of V (n 3 ops), the Schur decom- 
position (about 9n 3 ops), the computation of R* B UD t U* Rb (3n 3 ops), for a total cost 
of about (14 + |)« 3 ops. 

All the steps of Algorithm 4.1 can be performed in a stable way thus the resulting 
algorithm is numerically stable. 

Remark 4.1. An alternative to compute A# t B is to use directly one of the 
formulae 

A# t B = A(A~ 1 B) t = B(B~ 1 A) 1 ~ t 

= Aexp(ilog(A- 1 B)) = Aexp(-tlog( J B- 1 yl)). ^ 



The expressions in the first row of ( |4.4[ ) can be evaluated either by forming the Schur 
decomposition of the matrix A~ 1 B (or B~ 1 A) which is nonnormal in the generic case 
or using the approximation algorithm of Higham and Lin [3T] . Alternatively one could 



use the expressions in the second row of (4.4) where the exponential and the logarithm 



can be computed as explained in |20j . Unfortunately none of these alternatives is of 
interest since they are more expensive than the Cholesky-Schur algorithm and do not 
exploit the positive definite structure of A, B and A^ t B. 

5. Algorithms based on the Pade approximation of z^ 1 / 2 . We give three 
methods (with variants) for computing the matrix geometric mean, based on matrix 
iterations or a quadrature formula, two of them are apparently new. The algorithms 
are derived using different properties of the matrix geometric mean, however, perhaps 
surprisingly, they give essentially the same sequences which can be also derived using 
certain Pade approximation of z -1 / 2 in the formula A(B^ 1 A)^ 1 ^ 2 . 

The first method is based on the simple property that iterating two means one 
obtains a new mean: the geometric mean is obtained as the limit of an iterated 
arithmetic-harmonic mean. The second is based on the polar decomposition and if 
the robustness is the main concern it is possible to compute it in a backward stable 
way 28, 23 . The latter is based on an integral representation of the matrix geometric 
mean computed with a Gauss-Chebyshev quadrature, that method could be useful if 
one is interested in the computation of (A#B)v, for A and B large and sparse. 

5.1. Scaled averaging iteration. Let a and b be two positive integers, their 
geometric mean Vab can be obtained as the limit of the sequences a k+ i = (a k + b k )/2, 
b k+ i = 2a k b k /(a k + b k ) with a = a and b — b. The updated values a^+i and b^+i 
are the arithmetic and the harmonic mean, respectively, of cik and bk ■ 

This "averaging technique" can be applied also to matrices leading to the first, 
as far as we know, algorithm for computing A^B provided by Anderson, Morley and 
Trapp [3 and based on the coupled iterations 

A = A, B = B, 

A k+1 = (A k +B k )/2, fc = 0,l,2,..., (5.1) 

B k+1 = 2A k (A k + B k )- l B k = 2{A' k 1 + B^)~\ 
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where A k and B k , for k = 1, 2, . . ., both converge to Aj^B. Observe that Ak+i is the 
arithmetic mean of A k and while B k+ i is the harmonic mean of A k and P^. 

The convergence is monotonic in fact it can be proved that A k > A k+ i ^ A#B ^ 
Bk+i ^ Bfc for k = 1, 2, . . . (see (D), where we say that Pi ^ P 2 if Pi — P2 is 
semidefinite positive. 

The sequences Ak and P& are related by the simple formulae Ak = AB k ~ 1 B = 
BB^ 1 A (or equivalently Bk = AAZ B = BA~ k 1 A), which are trivial for k = and, 
assuming them true for k, then, the equality B k \ = (A^ 1 + B k ~ 1 )/2 yields 

A k+ i = \{A k + Bk) = \{ABl l B + AA-'B) = AB^B, 



1 



(BB- 1 A + BA- 1 A)=BB-l 1 A, 



hence, the formulae are proved by an induction argument. 

Using the previous relationships, iteration (5.1) can be uncoupled obtaining the 
single iterations 



A = A(orB), A k+1 = ^(Ak + AA-'B), k = 0,1,2, 



and 



(5.2) 



B =A (or B), B k+1 = 2(B k - 1 +B- 1 B k A- 1 )-\ k = 0,1,2, 



(5.3) 



Iteration (5.2) has the same computational cost as (5.1), and seem to be more at- 
tractive from a computational point of view since requires less storage. However, 
iterations (5.2) and (5.3) are prone to numerical instability than (5.1) as we will show 
in Section [S 

Yet another elegant way to write the averaging iteration is obtained observing 

that 

B k+1 = 2A k (Ak+B k )- 1 (Ak+Bk~Ak) = 2A k -2A k (A k +B k )- 1 A k = 2A k -A k A~l x A k , 
which yields the three-terms recurrence 



Ao = A, A± = (A + B)/2, 

A k+ 2 = \{A k+1 + 2A k - AkA-^Ak), ~ °' 2 ' 



(5.4) 



Essentially, the same algorithm is obtained applying Newton's method for com- 
puting the sign of a matrix in the following equality proved by Higham et al. [22] : 



sign(C) = 



A#B 
(A#P)"! 



C := 



B 

A- 1 



(5.5) 



The sign of a matrix M having nonimaginary eigenvalues can be defined as the limit 
of the iteration Mq = M, M k+ \ = (M k + M k ~ 1 )/2. Applying the latter iteration to 

X k 



the matrix C of (5.5) yields a sequence Ck 



Y k 



and the coupled iterations 



X Q = B, Y =A-\ 
Xk+^iXk+Y- 1 )^, fc = l,2,. 
Y k+1 = (Y k +X^)/2, 



(5.6) 
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where X k converges to A#B and Is, converges to (A#B)~ 



We prove by induction that the sequences (5.1 ) and (5.6) are such that X k 
-i r„.i._-, o T „^„ iV _ (b + A)/2 = Ax, Yl = (A- 1 +B- 1 )/2 = 



Y k = B7\ for k = 1, 2, . . . In fact Xi 



while A fe+1 = (X k + F- 1 )/2 = (A, 



B fc )/2 = A fc+1 and F fc+1 = (Y fc + X" 1 )^ 



Ah, 

-i 



(V+V)/2 = B fe -i. 

Iteration (5.1 1 based on averaging can be implemented at the cost per step of three 



inversion of positive matrices, that is 3n 3 ops, while iteration (5.6) based on the sign 
function can be implemented at a cost of 2n 3 ops. Moreover, the scaling technique for 
the sign function allows one to accelerate the convergence. Let M be a matrix such 
that the sign is well defined, from sign(M) =sign(7Af ) for each 7 > 0, one obtains the 
scaled sign iteration which is M — JqM, M/.+1 = (jkMk + (ikMk) _1 )/2, where 74, is 
a suitable positive number which possibly reduces the number of steps needed for the 
required accuracy. A common choice is the determinantal scaling 7^ = | det(Mfc)| -1 ' n 
[14j . a quantity that can be computed in an inexpensive way during the inversion of 

M k . Another possibility is to use the spectral scaling 7*, 



which is interesting in our case since the eigenvalues of C 



p{M^)/p{M k ) [17], 

B 

A- 1 



are all real 



and simple (in fact C 2 



BA~ 






A- X B 



has only real positive simple eigenvalues) 



and in this case a theorem of Barraud [9, 20 j guarantees the convergence to the exact 
value of the sign in a number of steps equal to the number of distinct eigenvalues of 
the matrix. 

To get the proper values of the scaling parameters it is enough to observe that 
I det(Cfc)| = I det(Xfc) dct(Yfc)| and thus for the determinantal scaling 7*, = | det(Xfe) det(Ffc) 
while p{C k ) = \/ p(X k Yk) and thus for the spectral scaling j k = V ' P^kYk)' 1 ) I 'p(X k Y k ). 

A scaled sign iteration is thus obtained. 



-1/(2") 



Algorithm 5.1a (Scaled averaging iteration: sign based) Given A and B pos- 
itive definite matrices. The matrix A^B is the limit of the matrix iteration 



X =B, Y = A- 1 . / 

Ik = y/p((X k Y k )- l )/p(X k Y k ) (or lk = I det(X fe ) det(y fc )|- 1 /(2»)) 
X k+1 = ( lk X k + (jkY k )- r )/2, 
Y k+1 = { lk Y k + ( 7fc A fc )- 1 )/2, 



k = 0,1,2, 



(5.7) 



Using the aforementioned connections between the sign iterates and the averaging 
algorithm the scaling can be applied to the latter obtaining the following three-terms 
scaled algorithm. 



Algorithm 5.1b (Scaled averaging iteration: three-terms) Given A and B 
positive definite matrices. The matrix A$=B is the limit of the matrix iteration 



lk = 
A = 

A k +2 



det(^ fc ) 2 



det(A) det(.B) 
A, ., = | 

7fc+2 



-l/(2n) 



^ (70A + B/70), 

(A k+1 + 2A k / lk+1 - AkA^Ak), 



k = 0,1,2. 



(5.8) 
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The same sequence is obtained considering the Palindromic Cyclic Reduction 
(PCR) 

P = l(A-B), Qo = \(A + B), 

Pk+i = -PkQk'Pk, k = 0, 1, 2, . . . (5.9) 

Qfe+i = Qk - 2P k Q k ~ 1 P k , 

whose limits are lim^ Q k = A^B and lim^ P k — 0. This convergence result is rooted 
on the fact the matrix Laurent polynomial 

C(z) = \{A- 1 B-^z- 1 + \{A-' + B- 1 ) + \{A~^ - B~ 1 )z, 

is invertible in an annulus containing the unit circle and the sequence Q k of the PCR 
converges to the central coefficient of its inverse, namely A#B |26j . 

Since the PCR verifies the same three-terms recurrence ( |5.8[ ) as the averaging 
iteration [25] . one obtains that Qk = Ak+i and thus Pk = (Ak — B k )/4. 

The connection with PCR is useful because allows one to describe more precisely 
the quadratic convergence of the averaging technique, as stated by the following the- 
orem of Iannazzo and Meini [55] . 

Theorem 5.1. Let A and B be positive definite matrices, then the PCR sequence 



Qk of (5.9 1 (and thus the sequence Ak obtained by the averaging iteration ( |5.1[ ) ) 
converges to Aj^-B and \\Qk — A#B\\ = 0(£ 2 ), where £ is any real number such that 
p 2 < £ < 1 with, p — cr/(l + \/l - a 2 ), where a = max Aeo .( A -i B ){|(A - 1)/(A + 1)|}. 

5.2. Pade approximants to z -1 / 2 . We give another interpretation of the se- 
quences obtained by the averaging technique in terms of the Pade appoximants of 
the function z~ x / 2 . To this end, we manipulate the sequence Ak of (5.1) showing its 



connection with Newton's method for the matrix square root and with the matrix 
sign iteration. 

Let S = A~ 1 B and consider the (simplified) Newton method for the square root 
of S, namely 



A = I, A k +i = \{A k + A^S). (5.10) 

The sequence Ak converges to S 1 / 2 for any A and B, since the eigenvalues of S are 
real and positive [20l Thm. 6.9]. We claim that A k — A~ x Ak, where A k is one of 
the two sequences obtained by the averaging iteration. To prove this fact, a simple 
induction is sufficient, in fact assuming that Ak — AAk, we have 

AA k+1 - l -(AA k + AA-'S) = \(A k + AA^AA^B) = A k+1 , 



in virtue of (5.2 1 



It is well known that Newton's method for the square root of the matrix S (5.101 
is related to the matrix sign iteration 



1 = 1 -(z k + z k - 1 ), z = s-v\ 



through the equality Z k = S 1 /' ' A k [50], and thus we have that 

A k - AA k - AS 1 ' 2 Z k = (A#B)Z k . (5.11) 
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The latter relation allows one to relate the averaging iteration to the Pade ap- 
proximants to the function i -1 / 2 in a neighborhood of 1. We use the reciprocal Pade 
iteration functions defined in |17) as 

/ \ Qn,m(l Z ) 

ta+l(Z)= ^, m (l^ 2 )' 

where P n ,m(0 /Qn,m(0 is the (n, m) Pade approximant to (1 — £) -1 / 2 at the point 0, 
that is 

as £ tends to and P n ,m and Q n ,m ar e polynomials of degree n and m, respectively. 

We define the principal reciprocal Pade iteration for m = n + 1 and m = n as 
g r (z) := 7j m+n+ i{z) — (p 2m ,2n+i{z), for which we prove the following composition 
property. 

Lemma 5.2. Let r,s be positive integers. If r is even then g rs (z) = g r (Jj s (z)), if 
r is odd then g rs (z) = g r (|^yy) ■ 

Proof. The principal reciprocal Pade iterations are the reciprocal of the well- 
known principal Pade iterations, namely 

9k{z) = gW) = (! + ,)*-(!-,)* (5 ' 12) 

where the latter equality follows from the explicit expression of g k {z) given in [201 
Thm. 5.9]. Notice that if r is even, then g r (l/ z) = g r (z), moreover, g rs (z) — g r (g s (z)) 
(in fact it is easy to see that the principal Pade iterations are conjugated to the powers 
through the Cayley transform C(z) = (1 — z)/(l + z), that is g r (z) = C(C(z) r )), and 
thus 

9r{g s {z)) = \ = , 1 , „ = — ^j-t = g rs (z), 

while if r is odd, then g r (l/z) = 1/ g r {z) and we get g rs (z) — g r ^ ~ ^ ^ . □ 



We are ready to state the main result of the section where we use g 2 (z) 



2z 



Theorem 5.3. Let P k (z)/Q k {z) be the [2 k ~ 1 ,2 k ~ 1 - 1] Pade approximant at 
to the function (1-z)^ 1 / 2 , with k>0, then A k = AQ k {I - A- l B)P k (I - A^B)' 1 . 

Proof. Let Z Q = {A- 1 B)- 1 / 2 . We prove that Z k = g 2 k(Z Q ) = tp 2 k 2 k_ 1 (Z Q ), this 
is true for k — 1, in fact Z\ — ?j 2 (Zo), while to prove the inductive step we use Lemma 



5.2 so that g 2 k+i(Z ) = g 2 (g 2 k (Z )) = g 2 {Z k ) = Z k+1 . 



Equation (|5.12| gives g 2 k (z) = g 2 k (l/z) and then g 2 k (Z ) = g 2 k (Z x ) = f 2 k y2 k_ 1 (Z Q 



Thus, in view of equation (5.11 1 and recalling that A#B — AZ , we have 



A k — AZ 1 Z k 



AZ 1 Z Q 2 k-i j2 fc-i_i(/ — Z 2 )P 2 k-i t2 k-i_ 1 (I — Z 2 ) 1 

= AQ k (I-A- 1 B)P k (I-A- 1 By 



□ 



12 



Bruno Iannazzo 



As a byproduct of the previous analysis we get that the Newton method for the 
scalar square root is related to the Pade approximation of the square root function. 
Corollary 5.4. Let z e C \ (-00, 0], and let 

!/ -u 

z k+l = 2^ Zk+ ZZ k )' Z ° = Z > 

be the Newton iteration for the square root of z, then zj, = where p(z)/q(z) is 

the [2 k ~ 1 , 2 k ~ 1 — 1] Pade approximant at 1 of the square root function z 1 / 2 . 

Remark 5.5. Rai'ssouli and Leazizi propose in [32] an algorithm for the ma- 
trix geometric mean which is based on a matrix version of the continuous fraction 
expansion for scalars a, b > 0, 



V ab 



fe=i 



The partial convergent ijv 



a+b . 

2 ' a+b 



k=l 



is proved to be 



tN 



ab- 



ab) 



2N+2 



+ (1 - Vab) 



.2N+2 



ab) 



2N+2 



(1 - Vab) 



,2N+2 ' 



thus from the expression for the Pade approximation in (5.12), and the character 



ization of the averaging iteration in terms of the Pade approximation we get that 
Ak = t 2 k-2_ 1 , for k ^ 2, where Ak is one of the sequences obtained by the averaging 
iteration with Aq = a and Bq — b. 

The same equivalence holds in the matrix case, so we get that the sequence tj^ 
converges linearly to the matrix geometric mean with a cost similar to the averaging 
iteration which indeed converges quadratically and moreover can be scaled. Thus, the 
sequence t]\! is of little computational interest. 

5.3. Algorithms based on the polar decomposition. Let A = R* a Ra and 
B = R* B Rg be the Cholesky factorizations of A and B, respectively. Using these 
factorization in formula (2.1) we obtain the following representations for the matrix 
geometric mean 

A#B = i?4polar(i? Aj R B 1 )i? B ,= R* B polar(i? B i? A 1 )i? A = R* A polar(i? B ^ A 1 )*i?s, 

(5-13) 

where we have used the symmetry of the matrix A#B and the commutativity of the 
matrix geometric mean function. 



We derive from (5.13) an algorithm for computing the matrix geometric mean. 



Algorithm 5.2 (Polar decomposition) Given A and B positive definite matrices 
with fi(A) ^ (i(B). 

1. Compute the Cholesky factorizations A = R* a Ra and B = R* b Rb] 

2. Compute the unitary polar factor U of RbRI , 

3. Compute A#B = R* B UR A . 



The polar factor of a matrix M can be computed forming its singular value 
decomposition, say M = from which we get the polar factor of M as Q\Q2- 

This procedure is suitable for an accurate computation due to the good numerical 
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property of the SVD algorithm, but it is expensive with respect to a method based 
on matrix iterations. 

A more viable way to compute the unitary polar factor of M is to use the scaled 
Newton method 

Zk+i = 2 ' Z ° = M > I 5 - 14 ) 

where 7^ > can be chosen in order to reduce the number of steps needed for 
convergence. 

A nice property of the scaled Newton method for the unitary polar factor of a 
matrix is that the number of steps can be predicted in advance for a certain machine 
precision and the algorithm is backward stable if the inversion is performed in a 
mixed backward/ forward way (see [551 An alternative is to compute the polar 

decomposition using a scaled Halley iteration as in |23j . 

The better choice for the scaling factor in the Newton's iteration is the optimal 
scaling 7^. = (cr 1 (Xfc)cr I j(A"fc)) 1 / 2 , where (Ti(X/ c ) and a n {X k ) are the extreme singular 
values of X k ■ In practice cheaper approximations of the optimal scaling are available 
[HI Sec. 8.6]. 



If 7^ = 1 for each fc, then the sequence Z k obtained by iteration (5.141 with 



Zq = RbRa 1 1S strictly related to the sequence obtained by the averaging technique, 



in fact Zk = R B * AkR^ 1 , where A k is defined in ( |5.2| with Aq — B. This equality 
can be proved by an induction argum 
and if the equality is true for k, then 



can be proved by an induction argument in fact R B * AqR^ = R B * RgRgR^ = Z 



'<<<■■■ i = 7;( z k + Z k *) - R B * (-£- + R* B 3 t ~Ra ) Ra 1 



- r r* ( — ~ ^ ~ — I Ra 1 - r b* ( k + o k — ) Ra 1 - R B * A k+iR A 1 



2 K k ' a \ 2 

\r k 1 A\ 

The equality RqZ^Ra — A^, and the monotonicity of Ak proves that the approxi- 
mated value of A#B is greater than or equal to Ajf^B in the order of V n - 

Remark 5.6. Notice that for A — I, Algorithm 5.2 reduces to the algorithm of 
Higham [2U1 Alg. 6.21] for the square root of a positive matrix B. A side-result of the 
previous discussion is that Higham's algorithm can be seen as yet another variant of 
the Newton method for the matrix square root. 

5.4. Gaussian quadrature. A third algorithm is obtained using the integral 



representation (1.4) obtained by Ando, Li and Mathias [5 j using an Euler integral, the 
same representation is obtained by Iannazzo and Meini [26j from the Cauchy integral 
formula for the function z -1 / 2 . 

The change of variable z = yields 

A#B.lt ((W' + u^m- 1 )-^ ( , 15) 

7T J-1 VI - Z 2 

which is well suited for Gaussian quadrature with respect to the weight function 
oj(z) = (1 — z 2 ) -1 / 2 , referred as Gauss-Chebyshev quadrature since the orthogonal 
polynomials with respect to the weight lu(z) are the Chebyshev polynomials (see |15) 
for more details) . For an integral of the form 

1 VT^ 2 
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where / is a suitable function, the formula is 



k=0 



Applying the Gauss-Chebyshev quadrature formula to (5.15) we obtain the fol- 
lowing approximation of A#B 



2 N 



T N+1 (A, B) = g((l + x^B- 1 + (1 - x^A- 1 )- 1 

(5.16) 

.4. 



fc=0 



Algorithm 5.3 (Gauss-Chebyshev quadrature) Given A and B positive definite 
matrices. Choose N and set 

A#BnT N (A,B). 



where Tm{A,B) is defined in (5.16) 



The computation cost is the inversion of a positive matrix, that is n 3 ops, for 
each node of the quadrature and two matrix multiplication at the end. The number of 
nodes required to get a fixed accuracy depends on the regularity of the function ip( z ) = 
((l + z)A+(l — z)B)~ 1 . The function ip(z) is rational and thus analytic in the complex 
plane except the values of z such that ip(z) is singular, which are the reciprocal of the 
nonzero eigenvalues of the matrix (B - A)(B + Ay 1 = (A~ X B - I)(A^ 1 B + I) -1 . 

We claim that all the poles are real and lie outside the interval [—1,1], which 
is equivalent to require that the eigenvalues of (A~ 1 B — I)(A~ 1 B + I)^ 1 lie in the 
interval (—1,1). Define C(z) = (z — l)/(z + 1), then the image under C(z) of the 
positive real numbers is the interval (—1, 1), then the eigenvalues of C(A^ 1 B) lie in 
the interval (—1, 1) since A~ 1 B has positive eigenvalues Ax, . . . , A„ and the eigenvalues 
of CiA^B) are c{\ x ), . . . ,C(A„) (compare [H Thm. 1.13]). 

Standard results on the convergence of the Gauss-Chebyshev quadrature (see [131 
Thm. 3]) imply that the sequence TV (A, B) converges to A#B linearly, in particular 
for each p 2 < £ < 1, it holds that ||TV(A= B ) - A # B \\ = 0{i N ), where 1/p is the sum 
of the semiaxes of an ellipse with foci in 1 and —1 and whose internal part is fully 
contained in the region of analiticity of ip(z). 

Since the poles of ip(z) are real and lie outside the interval (—1,1), then the 

largest ellipse is obtained for p — l/(^ + y^r — 1) = <r/(l + VT— cr 2 ), where a = 
max{|C(A i )|} (notice that \ja is the pole of ip(z) nearest to [—1, 1]). 

If m and M are the smallest and largest, respectively, eigenvalues of A _1 i?, then 
the convergence of TV (A, B) is slow if m is small or M is large. By a suitable scaling 
of A, it is possible to have mM = 1, which gives a faster convergence, however, when 
M/m tends to infinity the parameter of linear convergence tends to 1, in this case 

a simple analysis shows that p = 1 + O [\p^j and thus the parameter of linear 

convergence of T/v(A, B) depends linearly on M/m. In Section [6] we give another 
quadrature formula whose dependence on M/m is just logarithmic. 
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A comparison of the parameters of linear convergence for the Gauss-Chebyshev 
formula Tj^{A,B) and the parameters of quadratic convergence for the averaging 
iteration in Theorem |5.1| reveals that they are essentially the same. This is not a 
mere coincidence, in view of the following result. 

Theorem 5.7. LetT k be the quadrature formula of (5.16) and B k be the sequence 
obtained by the averaging technique (5.1 1 then Bk = T 2 k-i, for k = 1, 2, . . . 

Proof. Let S = A~ X B, assume that S — I is invertible, then 

N 
2 

fc=0 



B^T, 



N 



^((l+^K+a-xfe^r 1 

fc=0 
N-l 

v ^2(I + S + x k (I-S))-\ 



N-l 



fe=0 



where JC(z) = (1 + z)/(l — z). Let 7~n(x) be the Nth Chebyshev polynomial, then 



e^o 1 ^ + tr 1 = n(t)/r N (t), thus 



B- l T N = -(I-S)- l U(1C(S))T N (1C(S))- 1 . 

To conclude the proof, since T\ — B\ by direct inspection, it is enough to prove by 
induction that T 2 k — 2(T 2 ~ k 1 _ 1 + A~ 1 T 2 k-iB~ 1 )~ l which is equivalent to prove that 
2(B~ 1 T 2k y 1 = (B- 1 T 2h -i)- 1 +SB- 1 T 2h -i (T 2 k is invertible since the zeros of T 2 ' fc (t) 
lie in (—1, 1)). Observe that T 2h (t) = ATik-i 7%i=-i, and thus 



■>ft-2 



(I - S)T' 2k -i (/C^))" 1 ^-! (JCiS))- 1 ^ (K(S)). 



2(B- 1 T 2k y 1 
On the other hand 

(_B T^fe— i) -\- SB T^fe-i 
= 2 k - 2 (I-S)T' 2k -i (ICiS))- 1 ^-! {K,{S))+2 2 - k S{I-S)- 1 T' 2k -i{JC(S))T 2k -i (IC(S))- 1 

After some manipulations, to conclude it is enough to prove that 

T 2 . {JC(S)) = T 2 h-x (IC(S)) 2 + 2 4 - 2k S(I - 5)- 2 T 2 ' fc - 1 (IC(S)) 2 , (5.17) 

this property is a special case of a more general identity involving Chebyshev polyno- 
mials, in fact for each k, it holds that 



%{K,(z)f 



Az 

fc 2 (z-l) 2 



Tl(lC{z)Y 



1 



(5.18) 



for a complex variable z. By the change of variable x = IC(z) formula (5.18) is 

X 2 - 1 2 

equivalent to T k {x) 2 — — — T k (x) + 1 which can be proved directly using Tk{x) — 



k 2 



cos(fcarccosa;). The equation (5.18) implies (5.17) in fact T 2 k = 2T 2k - x — 1. 

If S — I is singular, then (3A~ 1 B — I is invertible in a neighborhood of j3 = 1 
except (3 — 1, thus T 2 k-i(A, f3B) — B k (A, f3B), which gives the desired equality as /3 
tends to 1. □ 
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6. Algorithms based on the rational minimax approximation of z 1//2 . 
In Section [5] we have found that many algorithms for computing the matrix geometric 
mean are variations of the one obtained by using certain Pade approximations of z~ x / 2 
in the formula A#B — A(B^ 1 A)^ 1 / 2 . To get something really different, one should 
change the rational approximation. The natural direction is towards the (relative) 
rational minimax approximation. 

Let Rk-i.k be the set of rational functions whose numerator and denominator 
have degree k — 1 and k, respectively. The function rfe, 7 (z) is said to be the rational 
relative minimax approximation to z- 1 ' 2 in the interval [1,7] if it minimizes over 
Rk-i,k the quantity 



max 

*e[i,7] 



iz) 



_Z-l/2 



j-1/2 



An explicit expression for rfc i7 (z), in terms of elliptic function is known since the work 
of Zolotarev in 1877 (see [54]). 

The same approximation is obtained by Hale, Higham and Trefethen [TH| by a 
trapezoidal quadrature following a clever sequence of substitutions applied to the 
Cauchy integral formula for A -1 / 2 , namely, 



A -i/2 



1 

2tti 



z- 1/2 {zl - A)' 1 dz. 



Since A#B = A{A- 1 B) 1 I 2 = B(A- 1 B)- 1 ^ 2 1 using the results of [IB], we get the 
following approximation (obtained by a quadrature formula on N nodes on a suitable 
integral representation of Aj^B) 



S N {A,B) = B 



-2K'Jm 



JY 



nN 



^M^A-SrV^dnfe) ] A ((S.l.i 



which is proved to coincide with Arpf^iB 1 A) for 7 = M/rn, where M and m are 
the largest and the smallest eigenvalues of A _1 B, respectively. 



The notation of (6.1) has the following meaning: 



f,- = (j-l/2)^i, l^j^N, 

w(tj) = \/msTi{tj\^), where sn(tj\^j), cn(t,|7) and dn(tj|7) are the Jacobi elliptic 
functions, while K' is the complete elliptic integral of the second kind associated with 
yJ^f (see [T] for an introduction to elliptic functions and integrals). 

The convergence of Sn(A, B) to A#B can be deduced from Theorem 4.1 of [18]. 
In particular, 

\\A#B - S N (A,B)\\ = 0( e -2 7 r 2 JV/(log(M/m)+3) ) _ 

Thus, the convergence of the sequence Sn(A, B) to A#B is dominated by a sequence 
whose convergence is linear with a rate which tends to 1 as M/m tends to 00, but 
whose dependence on M/m is just logarithmic. On the contrary, the rate of linear 
convergence of the Gauss-Chebyshev sequence Tjy(A,B) of (5.161 depends linearly 



on M/m, and thus we expect that the formula Sn(A,B) requires less nodes than 
Tn(A,B) to get the same accuracy on the approximation of Aj^B at least for large 
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Fig. 7.1. An electric circuit whose joint resistance is related to the matrix geometric mean 

values M/m. In practice, the approximation obtained from Tn(A, B) is always better 
than Sn{A, B) as suggested by our numerical tests of Section [8j 
We describe the synthetic algorithm. 

Algorithm 6.1 (Rational minimax) Given A and B positive definite matrices. 
Choose N and set 

A#B « S N , 



where Sjv is defined in (6.1) 



7. Applications. We review some of the applications in which the geometric 
mean of two matrices is required, they range from electrical network analysis [3J to 
medical imaging [7], from norm on fractional Sobolev spaces [B] to image deblurring 
[IB] , to the computation of the geometric mean of several matrices [5J [13], with indirect 
applications to radar [5] and elasticity [55]. 

7.1. Electrical networks. Fundamental elements of a circuit are the resistances 
which can be modeled by positive real numbers. It is a customary high school argu- 
ment that two consecutive resistances r\ and ri in the same line can be modeled by 
a unique joint resistance whose value is the sum r\ + r 2 , while if the two resistances 
lie in two parallel lines their joint resistance is the "parallel sum" (rj -1 + r^ 1 )^ 1 ■ 

More sophisticated devices based on resistances are n-port networks, which are 
"objects" with 2n ports at which current and voltage can be measured, without know- 
ing what happens inside. The usual way to model n-port networks is through positive 
definite matrices. In this way two consecutive n-ports A and B can be modeled as the 
joint n-port A + B, while two parallel n-ports give the joint n-port {A^ 1 + B -1 ) -1 . 

Complicated circuits, made of several n-ports can be reduced to a joint n-port 
using these sums and parallel sums. Consider the circuit in Figure [7TT] it is an infinite 
network (which models a large finite network). 

Let Z k be the joint resistance of the subcircuit obtained selecting the first k loops, 
then it can be shown that Z\ — B and 

Z k+1 = (B- 1 + (A + Z k )- 1 )-\ 

and the sequence has limit lim/^oo Z k = |(— A + (A#(A + 4B))). This limit is 
the joint resistance of the infinite circuit. For further details see [2], from which the 
example is taken. 

It is worth pointing out that the definition of geometric mean of two matrices 
first appeared in connection with these kind of applications [31) . 

7.2. Diffusion tensor imaging. The technique of Nuclear Magnetic Resonance 
(NMR) in medicine produces images of some internal parts of the body which are used 
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by medics to give a diagnose of important pathologies or to decide how to perform a 
surgery. 

One of the quantities measured by the NMR is the diffusion tensor which is a 
3x3 positive matrix describing the diffusion of the water in tissues like the white 
matter of the brain or the prostate. The technique is called Diffusion Tensor Imaging 
(DTI). 

The diffusion tensor is measured for any of the points of an ideal grid into the 
tissue, thus one has a certain number of positive matrices indexed by their positions. 

A problem in DTI is the "interpolation" of tensors, that is, given two tensors, 
find one or more tensors in the line joining them, the more adherent to the real data 
as possible. This is useful for instance to increase the resolution of an image or to 
reconstruct some corrupted parts. 

Many models have been given for the interpolation of tensors in DTI, the most 
obvious of which is the linear interpolation, where k points between A and B are 
= fc+T^ + fc fc+7"' B, for j = 1, . . . , k. The linear interpolation finds point equally 
spaced on the line joining A and B in the space of the matrices, that is, uses the 
Euclidean geometry of C nxn . 

Some more adequate models use Riemannian geometries. Using the geometry 
given in Section [T] we get the interpolation points 

Pu = A# 3/[k+1) B = A(A- 1 By^ k+1 \ j = 1, . . . k. 

Using the log Euclidean geometry defined in [7] , we get the interpolation points 

H = exp (-^ log(A) + k+ k l +l ] log(S)) , j = 1, . • • k. 

The log Euclidean geometry has been introduced as an approximation to the Rie- 
mannian geometry where quantities are easier to be computed. However, in the 
interpolation problem described here, using the Cholesky-Schur algorithm of Section 
[4] to compute P% (reusing the Schur factorization of RjfBRj^ for each j) is much 
less expensive than the computation of P k using the customary algorithms for the 
logarithm and the exponential of a matrix. 

7.3. Computing means of more than two matrices. The generalization of 
the geometric mean to more than two positive matrices is usually identified with their 
Karcher mean in the geometry given in Section [I] (see |llj for a precise definition). 

The Karcher mean of Ax, . . . , A m can been obtained as the limit of the sequence 
Sk = Sk-i#i/kA(k mod m)+i) with Si = A\ as proved by Holbrook [24] . The resulting 
sequence is very slow and cannot be used to design an efficient algorithm for the 
computation of the Karcher mean, however it may be useful to construct an initial 
value for some other iterative methods like the Richardson-like iteration of Bini and 
Iannazzo |llj . 

Other geometric-like means of more than two matrices are based on recursive 
definitions like the mean proposed by Ando, Li and Mathias [5], which for three 
matrices A , B Q and Co is defined as the common limit of the sequences 

Ak+i = Bk#Ck, Bk+i = Ck#Ak, Ck+i — Ak#Bk, k = 0,1,2,... 

These sequences converge linearly to their limit. Another similar definition which 
gives cubic convergence (to a different limit) has been proposed in |T3l [30]., who 
propose the iteration 

Ak+i = A k # 2 /3{Bk#Ck), B k+ i = B k # 2 /3(Ck#A k ), C k+ i = C k #2/3{A k #B k ). 
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As one can see, the efficient computation of A# t B is a basic step to implement these 
kind of iterations. 

7.4. Image deblurring. A classical problem in image processing is the image 
deblurring which consists in finding a clear image from a blurred one. In the classical 
models, the true and the blurred images are vectors and the blurring operator A is 
linear, thus the problem is reduced to the linear system Af — g which in practice 
is very large and ill-conditioned. A computationally easy case is the one in which A 
is a band Toeplitz matrix, which corresponds to the so-called shift-invariant blurred 
operators. 

Even if A is not shift-invariant, it can be possible, in certain cases, that a change 
of coordinate M makes it shift-invariant, i.e. M T AM is band Toeplitz. If such a M 
exists and is known, then the linear system Af = g has the same nice computational 
properties as a band Toeplitz system. 

When A and T are positive definite, the matrix M — A _1 #T is an explicit change 
of coordinates. For further details see [US] . 

7.5. Discrete interpolation norm. The material of this section is taken from 
[B] to which we address the reader for a full detailed description. 

Let f2 C R™ be open, bounded and with smooth boundary, and let iJg (SI) be the 
Sobolev space of differentiable functions on L 2 (tt) with zero trace, while Hq(£1) be 
the set of functions on L 2 (Jl) with zero trace. 

Let {(pi, . . . , f n } be a set of linearly independent piecewise linear polynomials on 
a suitable subdivision of fl (arising, for instance, from a finite elements method), then 
the span in Hq (resp. Hq) of {tpi}i=i,...,n, is an Hilbert subspace Xh (resp. Yj,). 

Define the matrices Lq and L\ such that 

{Lo)ij = ((Pi,<Pj)L 2 (n), ( L i)ij = (V<pi,V<pj) £ 2(n). 

The matrices Lq and L± are positive definite since they are Grammians with respect 
to a scalar product, in particular Lq is a discrete identity and L± is a discrete Dirichlet 
Laplacian. A norm for the interpolation space [Xh,Yh]s is given by the energy norm 
of the matrix 

Lq{L 1 Lq) 1 ® = Lq#x-$Lx. 

The most interesting case is d — 1/2, where the norm is given by the geometric mean 
of Lq and L\. 

A similar construction can be used to generate norm of interpolation spaces be- 
tween finite dimensional subspaces of generic Sobolev spaces with applications to 
preconditioners of the Stenkov-Poincare operator or boundary preconditioners for 
the biharmonic operator. 

8. Numerical Experiments. We present some numerical tests to illustrate the 
behavior in finite precision arithmetic of the algorithms presented in the paper. The 
tests have been performed using GNU Octave 3.2.3 on a 2008 Laptop. The scripts 
of the tests are available at the author's personal web page, so that any numerical 
experiment can be easily replicated by the reader. The implementations are not 
efficient, but they are made just to test the behavior of the algorithms. Regarding 
Algorithm 6.1, based on the rational minimax approximation, we have used the code 
of |18j . The implementation of the best algorithms for the matrix geometric mean 
can be found also in the Matrix Means Toolbox [T2"] . 
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As a measure of accuracy we consider the relative error \\G — G||/||G||, where G is 
the computed value of the geometric mean, while G is the exact (up to the roundoff) 
solution obtained by a direct formula. 

Test 8.1. We want to compare the behavior of the algorithms showing how 
the convergence of the iterative algorithms and quadrature formulae depends on the 
quotient M/m, where M and m are the largest and the smallest eigenvalues of A~ 1 B. 

We consider, for x > 1/2, the matrices 



A = 



2 1 
1 2 



B = 



x 1 
1 2 



whose corresponding geometric mean and A 1 B are 



A#B = 



1(1 



y/6x 
1 



3) 



A B = 



\{2x-l) 
|(2 -x) 



For x > 2, we have M/m — |(2x — 1). 

For x = 10 and x — 1000, we compute an approximation of A#B in double 
precision using the different algorithms and monitor the relative error at each step for 
the iterations and for an increasing number of nodes for quadrature rules. The results 
are drawn in Figure [8~Tj where the algorithm considered are: the Averaging algorithm 



(AV), namely iteration (5.1|; the Averaging iteration with spectral scaling (AVs), 
namely Algorithm 5.1a; the polar decomposition algorithm (PD), namely Algorithm 
5.2, where the polar factor is computed by Newton's method with spectral scaling; 
the rational minimax approximation algorithm (MM), namely Algorithm 6.1; and the 
Gauss-Chebyshev quadrature (GC), namely Algorithm 5.3. 

As one can see, the convergence of iterations and quadrature formulae is strictly 
related to the quotient M/m as the analysis suggests. Both quadrature rules show 
linear convergence, but the one based on rational minimax is much more effective. 
Regarding the scalings of the averaging/sign iteration, the fast convergence of the 
spectral scaling fits the fact that this case is made of 2 x 2 matrices and hence two 
steps are sufficient for the convergence. Nevertheless, the spectral scaling has given 
better convergence in all of our experiments with respect to the determinantal scaling. 

Test 8.2. Now we want to test the algorithms in some tough problems, we 
consider the identity matrix / and a diagonal matrix whose diagonal elements are 
equally spaced between 1 and t, for t > 1. We test the algorithms for the couple 
A = MM*, B = M DM* , whose matrix mean is MD 1 / 2 M*, and where M the 
Hilbcrt matrix which is a classical example of a very ill-conditioned matrix. The 
exact solution can be computed accurately since A#B — MD^I^M* , and thus the 
relative error gives a genuine measure of the accuracy of the algorithms. 

We use the same algorithms as Test 8.1 removing the one based on Gauss- 



Chebyshev quadrature, since it is much less efficient, and adding the Cholesky-Schur 
method (Algorithm 4.1) whose great stability guarantees the best forward error. In 
Figure |8.2| there is a comparison of methods for 5x5 matrices and for the values 
t = 10 2 and £ = 10 4 . In the case t = 10 2 the relative condition number, as defined in 
Section [3] is 1.5 • 10 6 and the lowest error (about 10 -9 ) is obtained by the Cholesky- 
Schur method, a similar accuracy with a lower computational cost is obtained by the 
polar decomposition, while the other algorithms seem to have more difficulties. The 
results are similar for t = 10 4 , the only difference is that now the convergence is slower 
and the conditioning is greater and thus the numerical results are poorer. 



Geometric Mean of two Matrices 



21 




Fig. 8.1. Comparison of the accuracy obtained at various steps by the Averaging iterations 
(AV), by the Averaging iteration with spectral scaling (AVs), by the polar decomposition algorithm 
(PD) and the accuracy for various number of nodes of the rational minimax approximation algorithm 
(MM) and of the Gauss-Chebyshev quadrature (GC) on Test \ 8. i\ for x = 10 (left) and x = 1000 
(right). 




12 14 




10 12 14 



Fig. 8.2. Comparison of the accuracy obtained at various steps by the Averaging algorithm 
(AV), by the Averaging iteration with spectral scaling (AVs), by polar decomposition algorithm (PD) 
and the accuracy for various number of nodes of the rati onal minimax approximation algorithm 
(MM) and the Cholesky-Schur method (Schur) on Test\8.2\with t = 10 2 (left) and t = 10 4 (right) 



What we have experimented in most of the tests is that, besides the Cholesky- 
Schur method, the polar decomposition method where the polar factor is computed 
by Newton's method with spectral scaling performs better than the other methods. 

Test 8.3. We want to address the stability issues related to the iterations pre- 



sented in Section 5.1 In fact, proving that the sequence (orbit) {-Yfe} obtained by 
a matrix iteration X^+i = G(Xk), with a given Xq, converges in exact arithmetic is 
not sufficient to guarantees the numerical convergence. This fact has been observed 
for the (simplified) Newton method for matrix roots and has been first explained by 
Higham [T^ . The reason of the numerical failure is that the limit of the iteration is 
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not a stable fixed point, in the sense that the derivative of G at the fixed point has 
spectral radius larger than one and thus there are points Y in any neighborhood of 
G such that G(Y) gets far from the fixed point. In finite arithmetic, rounding errors 
may cause a deviation from X k to a nearby point X k whose orbit diverge. 

We compute the derivative at AjfB of the iterations defining the averaging iter- 



ation (5.1) and its uncoupled variant (5.2 1 showing that the first has spectral radius 



less than one (and so it is stable) for any A and B, while the second has spectral 
radius greater than one for certain A and B. 

Define G(X, Y) = [(X + Y) /2 2X{X +Y)~ 1 Y] such that the averaging iteration 
is [A k +i B k+1 ] = G(A k ,B k ) then 

dG [x Y] [H,K] 

-(H+K) 2H(X+Y)- 1 Y-2X(X+Y)- 1 (H+K)(X+Y)- 1 Y + 2X(X + Y)- 1 K 



whose spectral radius 



from which we get in the vec basis dG[A#B a#b] = \ 
is one. This fact let us expect that a small perturbation on the iterates A k and B k of 



1 1 
1 1 



(5.1) near to the geometric mean is not amplified in the successive iterates. 

On the other hand, define F(X) = \(X + AX^B), then dF x [H] = \(H - 
AX~ 1 HX~ 1 B), in the vec basis we have 

dF A#B = 1(1- B(A#B)- 1 ® A(A#B)- 1 ) = i(J - Z ® zT 1 ), 

where Z = (BA^ 1 ) 1 / 2 . The eigenvalues of dFA#B are of the form |(1 — Xi/Xj) where 
Xi,Xj are any two eigenvalues of Z. Since the eigenvalues of Z are real we get that 
p(dFA#B) ^ 1 if ||1 — AM/A m | ^ 1, that is Ajvf/A„ 3, where Am and A m are the 
largest and the smallest eigenvalues of Z, respectively. Thus, we expect numerical 
instability for matrices A and B such that the quotient Ajvf/A„ is greater than 3. 

We consider the matrices of Test |8.2| with n = 5 and where the diagonal elements 
of D are logarithmically spaced between 1 and 10 - ', for t = 0.5 and t = 1.5. In the 
former case we get p(dFA#s) ~ 1.8 ^ 3, in the latter p(dFA#s) ~ 5.6 > 3, and in 



fact in the first case the uncoupled averaging iteration (5.2) performs stably, while in 
the second case it reveals instability. In both cases the standard averaging iteration 
is stable. The results are drawn in Figure |8.3| 

9. Conclusions. We have studied the computational issues related to the ma- 
trix geometric mean of two positive definite matrices A and B, from the conditioning 
to the classification of the numerical algorithms for computing A^B. We have an- 
alyzed many algorithms, most of which are new, or have not yet been considered in 
the literature. The algorithms are either based on the Schur decomposition or are 
iterations or quadrature formulae converging to the geometric mean. A very nice fact 
is that all iterations and quadrature formulae we were able to found were related to 
the two important rational approximation of z -1 / 2 , namely, the Pade approximation 
and the rational relative minimax approximation. 

We have observed that the Pade approximation requires a much high degree 
than the rational relative minimax approximation to get the same accuracy. On the 
other hand, the advantage of the Pade approximation is that there exists a recurrence 
relation between the [2 k , 2 k — 1] Pade approximants to z" 1 / 2 and this recurrence leads 
to a quadratically convergent algorithm which outperforms the one based on rational 
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Fig. 8.3. Iteration step vs. relative error f or th e averaging iteration and its uncoupled version 
for t = 0.5 (left) and t = 1.5 (right) as in Test \8.3\ In the second case the derivative of G(X) has 
spectral radius greater than one and the uncoupled iteration shows numerical instability. 



minimax approximation. The quadratically convergent iterations can be scaled to get 
very efficient algorithms, as the one based on the polar decomposition of a suitable 
matrix. 

Our preferred algorithms for computing the matrix geometric mean are the one 
based on the Schur decomposition, namely the Cholesky-Schur algorithm, and the 
ones based on the scaled averaging and scaled polar decomposition, although for large 
matrices it may be necessary to use a quadrature formula as the rational minimax 
approximation. A better understanding of the problem (A#B)v with A and B large 
and sparse matrices and v a vector is needed and is the topic of a future work. 

We wonder if some kind of recurrence could be found for the rational relative 
minimax approximation. Moreover, the algorithms based on the Pade approximation 
benefit considerably by the scaling technique. One might wonder what is the inter- 
pretation of the scaling in terms of the approximation and if it is possible to get a 
"scaled rational minimax" approximation in order to accelerate the convergence. 

Another issue is related to the equivalence of methods. For this problem we have 
found the equivalence between a Newton method, a Pade approximation, the Cyclic 
Reduction and a Gaussian quadrature. We wonder if this intimate connection is true 
in more general settings. For instance, it would be nice to see the Cyclic Reduction 
algorithm as a function approximation algorithm. 

Acknowledgments. The author wish to thank George Trapp who kindly sent 
him some classical papers about the matrix geometric mean and Elena Addis a student 
who defended a thesis on these topics and who gave the remarkable quote about the 
interpretation of the geometric mean as the mid-point of a geodesic: 
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